Here are my top 10 static dataviz plots from 2016. All plots were generated with R graphics and graphics packages from CRAN, Bioconductor and GitHub.
Data here, data there, data everywhere! A mountain of data by itself, without visualization and context, cannot provide meaningful insight. When it comes to data visualization, 2016 was a breakout year for me. There were a massive number of lessons in preparing, piloting and modeling data. It was challenging, frustrating, exhilarating and, dare I say, fun all at the same time. I created about 12 blogs**, published 4 documents on Rpub**, 1 Kaggle ‘kernel’, 11 shiny apps (not all deployed), and several interactive and static maps. By no means have I done it all or do I know it all; there is a lot more to learn!
The selection and ordering criteria are not scientific; I simply chose based on my own satisfaction level after each plot was rendered. The top 10 static data visualizations listed below were created for data projects I undertook in 2016. I have also added runner-up data-driven plots/maps/graphs at the end. I will add my top 10 dynamic plots and favorite web applications created in 2016 in separate blogs.
The majority of my time was spent learning, applying what I learned, and wrangling/cleaning/tidying data. If you don’t love this part of it, it will make you question your existence. Fortunately I do! I can confirm that cleaning does indeed take 80% of the effort of organizing and arranging data! Once tidy data is achieved (“variables in columns/observations in rows”), applying statistical summaries, modeling and visualization becomes easier. Note: All of the figures listed here are in png format. The originals, including the code, can be found in the blogs and/or my GitHub repo.
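To make the tidy-data idea above concrete, here is a minimal sketch in base R. The table, its column names and its values are invented for illustration and do not come from any of my projects. It reshapes a wide table (one column per subject) into tidy form (one row per observation), after which a summary is a one-liner.

```r
# Hypothetical wide table: one row per student, one column per subject
wide <- data.frame(
  student = c("a", "b"),
  math    = c(80, 65),
  science = c(75, 90)
)

# Reshape to tidy (long) form: one row per student/subject observation
tidy <- reshape(
  wide,
  direction = "long",
  varying   = c("math", "science"),  # columns that hold the measurements
  v.names   = "score",               # name of the new value column
  timevar   = "subject",             # name of the new key column
  times     = c("math", "science"),  # key values, in the same order as varying
  idvar     = "student"
)
rownames(tidy) <- NULL

# With tidy data, a per-subject summary needs no special handling
subject_means <- aggregate(score ~ subject, data = tidy, FUN = mean)
```

The tidyr package offers friendlier functions for the same reshaping; base R is used here only so the sketch runs without extra packages.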
Without further ado - here are my top 10 static data visualization plots/graphs/maps for 2016.
This plot was generated for an analysis of math and science score performance for black female students in specific socioeconomic groups.
This plot combines three categorical variables in one visual to show black female students' scores, along with the statistical trend where there is sufficient data.
I participated in the 2016 US election, working in phone banks and canvassing. I generated dummy data and created this slide, which shows the daily count of voters who will need to be called back, based on their request or other factors. It identifies voters by gender, race and age.
This visual is a follow-up to Number 8. It counts voters' responses (yes, not sure, no-response, no, call back) and breaks down the count by ethnicity; it also shows the total for each category as a light yellow bar. Here I layered a png image and the two data frames for the bar charts. ggplot2 is the star here.
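The two-data-frame layering can be sketched roughly as follows. This is not the original code: the counts, group labels and bar widths are made up, and the png layer is left out. Each `geom_col()` layer gets its own `data` argument, so the totals sit behind the per-group bars.

```r
library(ggplot2)

# Dummy counts, invented for illustration only
by_group <- data.frame(
  response  = rep(c("yes", "not sure", "no"), each = 2),
  ethnicity = rep(c("group A", "group B"), times = 3),
  n         = c(12, 8, 5, 7, 3, 4)
)

# Second data frame: total count per response category
totals <- aggregate(n ~ response, data = by_group, FUN = sum)

p <- ggplot() +
  # Back layer: one wide light yellow bar per category total
  geom_col(data = totals, aes(x = response, y = n),
           fill = "lightyellow", width = 0.9) +
  # Front layer: narrower bars split by ethnicity
  geom_col(data = by_group, aes(x = response, y = n, fill = ethnicity),
           position = "dodge", width = 0.6)

# Print p (or call ggsave()) to render the chart
```

Layers are drawn in the order they are added, which is why the totals layer comes first.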
I am currently working on a project that digs into the prevalence of cancer in the US. The Centers for Disease Control keeps statistics for all types of cancer by Metropolitan Statistical Area (MSA) of the US. This plot was created by converting the MSA shapes from the US Census to geojson and layering the geojson shapes on top of a US tile from OpenStreetMap.
The following is a re-creation of an example from the help section. I like the regional breakdown and the coloring of the continent. tmap is the star here.
This plot shows the 8 wards of Washington, DC, with json shapefiles layered on top of an OpenStreetMap tile.
I took this picture with a Samsung smartphone in the fall of 2016. With its deep colors, it is rich in pixels. I read the image into R with the EBImage Bioconductor package and converted it into a data frame, resulting in 9.5 million pixels. That is a lot of data for a personal computer. R graphics rendered the image one pixel at a time, and the first render took 25 minutes or so.
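To show why the data frame grows so large, here is a small sketch of the image-to-data-frame step. It uses a synthetic array in place of the photo, and the height x width x channel layout is an assumption that may differ from what EBImage returns. The point is that every pixel becomes one row.

```r
# Synthetic stand-in for a photo: height x width x 3 colour channels in [0, 1]
set.seed(1)
h <- 4
w <- 6
img <- array(runif(h * w * 3), dim = c(h, w, 3))

# One row per pixel: position plus a hex colour string.
# Arrays are stored column by column, so the row index (y) varies fastest.
pixels <- data.frame(
  x      = rep(seq_len(w), each = h),
  y      = rep(seq_len(h), times = w),
  colour = rgb(as.vector(img[, , 1]),
               as.vector(img[, , 2]),
               as.vector(img[, , 3]))
)

n_pixels <- nrow(pixels)  # always h * w
```

A photo a few thousand pixels on each side reaches millions of rows the same way, which is where the long render time comes from.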
I generated this plot for a Kaggle “kernel”. It packs four plots into one and uses the new ggplot annotation feature. The plot shows statistics for aircraft accidents in the US from the NTSB database. I also used the forcats package to order the factors for the bar charts, created a theme, and got fancy with fonts and colors.
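As an illustration of the factor ordering step, here is a minimal sketch with forcats. The flight-phase values are invented and are not taken from the NTSB data. `fct_infreq()` reorders factor levels from most to least frequent, so a bar chart built on the factor comes out sorted by count.

```r
library(forcats)

# Hypothetical categorical column, for illustration only
phase <- c("landing", "takeoff", "landing", "cruise", "landing", "takeoff")

# Default factor levels are alphabetical: cruise, landing, takeoff
phase_default <- factor(phase)

# Reorder levels by how often each value occurs, most frequent first
phase_ordered <- fct_infreq(phase_default)

ordered_levels <- levels(phase_ordered)
```

For horizontal bars, wrapping the result in `fct_rev()` puts the most frequent category at the top.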
This plot was generated while trying to plot a million taxi drop-off points in New York City on a small footprint.
The runner-up plots that follow are in no particular order.
Random Forest Tree plot. Package used:
Font and color testing with the xkcd package. Line chart with a subset of gapminder data.
Choroplethr population distribution map for DC, Maryland, Virginia
DMV Airbnb space categorical locations
This visual shows the US cities with the most food trucks.
Women vs Men salary comparison for college professors.
Although I made the plots, I want to take a moment to thank all the great minds who created the packages used to make them and who answered questions on Stack Overflow, GitHub and Twitter. Most of the heavy lifting was done by them!